Unsloth AI has released Unsloth Studio, an open-source no-code visual tool that aims to simplify fine-tuning of large language models and lower the barrier to entry. Through custom backpropagation kernels, it doubles training speed and cuts VRAM usage by roughly 70%, without requiring complex environment configuration or expensive hardware.
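For context, this is the kind of LoRA fine-tuning run that Unsloth Studio wraps in a visual interface. The sketch below uses the underlying unsloth Python library; the model name, dataset, and hyperparameters are illustrative, and the exact SFTTrainer keyword set varies across trl versions:

```python
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer
from unsloth import FastLanguageModel

# Load a 4-bit base model with Unsloth's patched kernels.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",  # illustrative base model
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters; only these small matrices are trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

dataset = load_dataset("imdb", split="train[:1000]")  # stand-in dataset

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(per_device_train_batch_size=2,
                           max_steps=60, output_dir="outputs"),
)
trainer.train()
```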
Ant Group's LingBot-World, an open-source interactive world model, offers a high-fidelity virtual training environment for embodied AI and autonomous driving. It simulates physics for low-cost digital training and transfers learned causal behaviors to real-world applications, addressing data scarcity and high training costs.
Allen AI's open-source SERA series simplifies AI programming, with training costs as low as $400. Its top model, SERA-32B, solves 54.2% of SWE-Bench tasks, outperforming comparable open-source models and approaching industry-leading benchmarks.
At re:Invent 2025, AWS launched Nova Forge and Nova Act to help businesses integrate proprietary knowledge into AI models, avoiding the pitfalls of fine-tuning closed models, performance degradation, and the high cost of training from scratch.
[Pricing comparison table: input price per million tokens, output price per million tokens, and context length (k tokens) for Google, OpenAI, xAI, Anthropic, Alibaba, Moonshot, ByteDance, Tencent, and DeepSeek.]
prithivMLmods
VibeThinker-1.5B is a 1.5-billion-parameter dense language model from Weibo AI, fine-tuned from Qwen2.5-Math-1.5B and designed specifically for mathematical and algorithmic coding problems. Trained under the 'Spectrum to Signal Principle' framework, it outperforms much larger models on several math-competition benchmarks. Training cost was approximately $7,800, and it supports outputs of up to about 40k tokens.
lmms-lab
LLaVA-OneVision-1.5 is a family of fully open-source large multimodal models that achieve advanced performance at lower cost by training on native-resolution images. It performs strongly across multiple multimodal benchmarks, surpassing competitors such as Qwen2.5-VL.
facebook
MobileLLM-R1 is an efficient reasoning model in the MobileLLM series, optimized for mathematics, programming, and scientific problems. It achieves high accuracy at a small parameter scale, with low training cost and high efficiency.
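A hedged usage sketch via Hugging Face transformers; the checkpoint name below is an assumption about the repo layout under the facebook organization:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "facebook/MobileLLM-R1-950M"  # assumed checkpoint name
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Compute 17 * 24 step by step."
inputs = tok(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=256)
print(tok.decode(out[0], skip_special_tokens=True))
```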
ByteDance
ContentV is an efficient video-generation framework that achieves high-quality video generation on limited compute through a minimalist architecture, a multi-stage training strategy, and a cost-effective reinforcement-learning framework.
FractalAIResearch
Fathom-R1-14B builds on the R1-distilled-14B model and reaches o4-mini-level mathematical reasoning within a 16K context, at a training cost of just $499.
qihoo360
Light-R1-32B is a math-competition model trained from Qwen2.5-32B-Instruct. Using curriculum-style SFT followed by DPO, it surpasses DeepSeek-R1-Distill at a training cost of only $1,000.
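The DPO stage mentioned above optimizes a pairwise preference objective. A minimal sketch of the standard DPO loss (Rafailov et al., 2023), not Light-R1's actual training code:

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Direct Preference Optimization loss.

    Each argument is the summed log-probability a model assigns to the
    chosen/rejected response; ref_* come from a frozen reference model.
    """
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the margin between preferred and dispreferred responses.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```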
nvidia
Minitron-8B-Base is a large language model derived by pruning Nemotron-4 15B and then recovering quality through distillation and continued training, requiring up to 40x fewer training tokens and roughly 1.8x less compute than training from scratch.
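The distillation step relies on a standard logit-matching objective of the kind sketched below; NVIDIA's full recipe (pruning schedule, loss mix) is in the Minitron paper, so treat this as a generic illustration:

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between softened teacher and student distributions."""
    t = temperature
    soft_teacher = F.softmax(teacher_logits / t, dim=-1)
    log_student = F.log_softmax(student_logits / t, dim=-1)
    # Scale by t^2 to keep gradient magnitudes comparable across temperatures.
    return F.kl_div(log_student, soft_teacher, reduction="batchmean") * t * t
```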
apple
TiC-CLIP is a vision-language model built on OpenCLIP and continually trained on time-ordered data: each update warm-starts from the previous checkpoint rather than retraining from scratch, substantially reducing the compute cost of keeping the model synchronized with fresh data.
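A toy, runnable sketch of the warm-start-plus-replay pattern behind continual training; the linear "model" and squared-activation loss are stand-ins (the real system trains CLIP with a contrastive loss), so this only illustrates the update loop:

```python
import torch
from torch import nn

def continual_update(model, old_buffer, new_batch, lr=1e-4, replay_ratio=0.5):
    """One warm-start update: mix replayed old samples with the new time
    slice so the model tracks fresh data without a from-scratch retrain."""
    n_replay = int(len(new_batch) * replay_ratio)
    idx = torch.randperm(len(old_buffer))[:n_replay]
    batch = torch.cat([new_batch, old_buffer[idx]])
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss = model(batch).pow(2).mean()  # stand-in for the contrastive loss
    opt.zero_grad(); loss.backward(); opt.step()
    return model

# Toy usage: feed in successive "time slices" of data.
model = nn.Linear(16, 1)
buffer = torch.randn(100, 16)
for _ in range(3):
    new = torch.randn(32, 16)
    model = continual_update(model, buffer, new)
    buffer = torch.cat([buffer, new])
```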
kadirnar
YOLOv10 is an efficient real-time object-detection model that improves detection accuracy while preserving real-time performance through optimized architecture and training strategies, without adding training cost.
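The ultralytics package ships YOLOv10 weights, so inference takes a few lines; the image path is a placeholder:

```python
from ultralytics import YOLO

model = YOLO("yolov10n.pt")  # nano variant; downloaded on first use
results = model("bus.jpg")   # path to any local image
results[0].show()            # display detections
```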
jetmoe
JetMoE-8B is an efficient open-source large language model that surpasses LLaMA2-7B while costing under $100,000 to train; as a mixture-of-experts model it activates only 2.2 billion of its parameters during inference, making it well suited to low-resource environments.
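Sparse activation comes from top-k expert routing. The sketch below shows the generic mechanism, not JetMoE's exact architecture (which also applies mixture-of-experts to attention):

```python
import torch
import torch.nn.functional as F
from torch import nn

class TopKMoE(nn.Module):
    """Sparse mixture-of-experts layer: each token is routed to its top-k
    experts, so only a fraction of parameters is active per forward pass."""
    def __init__(self, dim, n_experts=8, k=2):
        super().__init__()
        self.router = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                          nn.Linear(4 * dim, dim))
            for _ in range(n_experts)
        )
        self.k = k

    def forward(self, x):  # x: (tokens, dim)
        gates = F.softmax(self.router(x), dim=-1)
        weights, idx = gates.topk(self.k, dim=-1)
        weights = weights / weights.sum(-1, keepdim=True)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

moe = TopKMoE(64)
print(moe(torch.randn(10, 64)).shape)  # (10, 64)
```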
PixArt-alpha
Pixart-α is an efficient Transformer-based text-to-image generation model capable of producing high-quality 1024-pixel images at extremely low training cost.
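Inference is available through the diffusers library's PixArtAlphaPipeline; the prompt is illustrative:

```python
import torch
from diffusers import PixArtAlphaPipeline

pipe = PixArtAlphaPipeline.from_pretrained(
    "PixArt-alpha/PixArt-XL-2-1024-MS", torch_dtype=torch.float16
).to("cuda")
image = pipe("A watercolor fox in a snowy forest").images[0]
image.save("fox.png")
```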
Yukang
LongLoRA is an efficient fine-tuning method that extends the context length of pre-trained large language models at limited computational cost. It combines shifted sparse attention (S2-Attn) with an improved LoRA scheme, significantly reducing the compute needed for long-context training while preserving model performance.
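A minimal sketch of the S2-Attn idea, not the official implementation: attention runs inside fixed-size token groups, but half of the heads see the sequence rolled by half a group so information still crosses group boundaries:

```python
import torch

def shifted_group_attention(q, k, v, group_size):
    """Shapes: (batch, heads, seq, head_dim); seq divisible by group_size."""
    q, k, v = q.clone(), k.clone(), v.clone()
    b, h, s, d = q.shape
    half, shift = h // 2, group_size // 2
    for t in (q, k, v):  # shift the second half of the heads
        t[:, half:] = t[:, half:].roll(-shift, dims=2)
    fold = lambda t: t.reshape(b, h, s // group_size, group_size, d)
    qg, kg, vg = fold(q), fold(k), fold(v)
    # Attend only within each group, then restore the full sequence.
    attn = torch.softmax(qg @ kg.transpose(-1, -2) / d ** 0.5, dim=-1)
    out = (attn @ vg).reshape(b, h, s, d)
    out[:, half:] = out[:, half:].roll(shift, dims=2)  # undo the shift
    return out

q = k = v = torch.randn(1, 8, 64, 16)
print(shifted_group_attention(q, k, v, group_size=16).shape)  # (1, 8, 64, 16)
```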
A comprehensive MCP server providing machine-learning model training, fine-tuning, and experiment management, with support for multiple training backends, cloud GPU deployment, and cost estimation.
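For a sense of how such a server exposes functionality, here is a hypothetical cost-estimation tool written with the official MCP Python SDK; the tool name and parameters are invented for illustration and may not match the real server:

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("ml-training")

@mcp.tool()
def estimate_cost(gpu_hours: float, hourly_rate_usd: float) -> str:
    """Estimate the cloud GPU cost of a training run."""
    return f"Estimated cost: ${gpu_hours * hourly_rate_usd:.2f}"

if __name__ == "__main__":
    mcp.run()  # serves tools over stdio by default
```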